1. INTRODUCTION


Recently, there has been considerable interest in the development of systems 
for the classification of remotely sensed imagery data for inventorying 
natural resources, monitoring crop conditions, etc. Usually, the inherent 
classes in the data are multimodal, and nonsupervised classification or clus- 
tering techniques (refs. 1, 2) have been found to be effective (ref. 3). 

These usually break up the image into its inherent modes or clusters. One of 
the crucial problems in the application of clustering techniques for the clas- 
sification of imagery data is to label the clusters. 

There is considerable interest in the statistical literature in labeling the
clusters (ref. 4). This problem also arises in labeling the regions obtained
by segmentation algorithms in the development of scene understanding systems.
In the recent literature, relaxation labeling algorithms (refs. 5, 6, and 7)
have been proposed for labeling the segmented regions, but these use
relational or spatial properties of the regions through compatibility
coefficients. However, in cluster labeling, the relational properties of the
clusters are either unavailable or not meaningful. For example, in aerospace
imagery, the regions of interest are crops, nonagricultural areas, etc., and
can be anywhere in the image; hence, it is difficult to define relational
properties.

It is the purpose of this paper to address the problem of labeling the
clusters using the information from a given set of labeled patterns. It is
assumed that the probability density functions and a priori probabilities of
the clusters or modes are given. Let these respectively be
$p(X \mid \Omega = i)$ and $\delta_i$, $i = 1, 2, \ldots, m$, where $m$ is the
number of modes or clusters. It is also assumed that a set of labeled
patterns $X_i(j)$ with labels $\omega_i(j) = i$; $j = 1, 2, \ldots, N_i$;
$i = 1, 2, \ldots, C$ is given, where $C$ is the number of classes. In remote
sensing, the labels for these patterns are provided by an analyst interpreter
(AI) by examining imagery films and using other information such as historic
information, crop calendar models, etc. Very often, the AI labels are
imperfect. It is relatively expensive to acquire labels, and a large number
of unlabeled patterns is usually available. Some approaches that use all the
given information are presented in this paper for optimum cluster labeling.

The paper is organized as follows. In section 2, the problem of obtaining
optimum class labels for the modes is formulated as one of maximizing a
likelihood criterion by exhaustive search over all possible label assignments.
Section 3 considers the problem of obtaining probabilities of class labels for
the clusters using the maximum likelihood criterion. A closed-form solution
that maximizes a lower bound on the criterion is presented in section 3. Also,
expressions for the asymptotic mean and variance of the resulting proportions
are presented. In section 4, the probability of correct labeling is used as a
criterion for obtaining probabilities of class labels for the modes. In
section 5, the variance of the class proportion estimates is proposed as a
criterion that uses both the given labeled and unlabeled pattern sets for
obtaining the probabilities of class labels for the modes. Imperfections in
the labels of the given labeled set are considered in section 6. Section 7
contains the experimental results in the processing of remotely sensed imagery
data, and a concluding summary is given in section 8. The problem of grouping
modes into their natural classes using unlabeled patterns is considered in
appendix A. Appendix B considers a two-class problem in which the labeled
patterns from a single class and a set of unlabeled patterns are given.
Fixed-point iteration schemes for the probability of correct labeling
criterion are presented in appendix C. Appendix D addresses the problem of
proportion estimation with impure clusters.


2. LABEL ASSIGNMENT TO CLUSTERS BY EXHAUSTIVE SEARCH


In general, the class-conditional density functions are multimodal. Let $C_i$
be the number of modes of class $i$, where $\sum_{i=1}^{C} C_i = m$. By
defining a criterion, the class label assignment to the modes that maximizes
the criterion can be chosen as the optimal assignment. Let $p_i(X)$ be the
density function of class $i$, $p_{ij}(X)$ be the density function of mode $j$
of class $i$, $q_{ij}$ be the a priori probability of mode $j$ of class $i$,
$q_i$ be the a priori probability of class $i$, and $p(X)$ be the mixture
density function. Then we have the following relationships.

Choosing the likelihood function as a criterion, the likelihood of occurrence
of the given patterns with their corresponding labels can be expressed by the
quantity $L'$, where
Since the logarithm is a monotonic function of its argument, taking the
logarithm of equation (2-2) results in

$$ L = \log(L') = \sum_{i=1}^{C} \sum_{j=1}^{N_i} \log\left\{ \sum_{k=1}^{C_i} q_{ik}\, p_{ik}[X_i(j)] \right\} + \sum_{i=1}^{C} N_i \log(q_i) $$    (2-3)

The a priori probability of mode $j$ of class $i$, $q_{ij}$, may be estimated
as follows. For a particular labeling assignment, let the modes
$1, 2, \ldots, C_i$ belong to class $i$. Then

$$ q_{ij} = \frac{\delta_j}{\sum_{l=1}^{C_i} \delta_l} $$    (2-4)

If the clustering algorithm generates a relatively small number of clusters
(in remote sensing, typically around 12), the optimal class label assignment
for the clusters can easily be obtained by exhaustive search. By giving all
possible class label assignments to the clusters and computing the value of
the criterion for each assignment, the optimal class label assignment can be
chosen as the one that extremizes the criterion. If the density functions of
the modes are Gaussian, the criterion takes a relatively simple form (for
example, if a clustering algorithm based on the maximum likelihood equations
(ref. 2) is used to fit the Gaussian density functions for the modes).
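
The exhaustive search described above can be sketched in a few lines of
Python. This is only an illustration under stated assumptions, not the
report's implementation: the modes are univariate Gaussians, the criterion is
the log-likelihood of the labeled patterns in which the joint density of class
i is taken as the sum of delta_l * p(X | mode l) over the modes assigned to
class i, and the names gaussian_pdf, log_likelihood, and best_assignment are
hypothetical.

```python
import itertools
import math


def gaussian_pdf(x, mean, var):
    """Univariate Gaussian density (1-D stand-in for a mode-conditional density)."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def log_likelihood(assignment, modes, labeled):
    """Log-likelihood of the labeled patterns for one mode-to-class assignment.

    assignment[l] is the class given to mode l, modes is a list of
    (delta, mean, var) tuples, and labeled maps class -> list of patterns.
    Assumption: the joint density of class i is the sum of delta_l * p(x | mode l)
    over the modes assigned to class i.
    """
    total = 0.0
    for cls, patterns in labeled.items():
        for x in patterns:
            joint = sum(delta * gaussian_pdf(x, mean, var)
                        for l, (delta, mean, var) in enumerate(modes)
                        if assignment[l] == cls)
            if joint <= 0.0:
                return float("-inf")
            total += math.log(joint)
    return total


def best_assignment(modes, labeled, n_classes):
    """Try every assignment of classes to modes and keep the most likely one."""
    best, best_score = None, float("-inf")
    for assignment in itertools.product(range(n_classes), repeat=len(modes)):
        if len(set(assignment)) < n_classes:
            continue  # require at least one mode per class
        score = log_likelihood(assignment, modes, labeled)
        if score > best_score:
            best, best_score = assignment, score
    return best, best_score


# Illustrative data: three modes, two classes; class 0 is bimodal.
modes = [(0.3, 0.0, 1.0), (0.3, 5.0, 1.0), (0.4, 10.0, 1.0)]
labeled = {0: [0.1, -0.2, 9.8], 1: [5.1, 4.9]}
best, score = best_assignment(modes, labeled, n_classes=2)
```

The search visits C^m assignments, so it is practical only for the small
cluster counts mentioned above (about 12 clusters and two classes gives 4096
candidates).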

2.1 CASE IN WHICH THE NUMBER OF MODES IS EQUAL TO THE NUMBER OF CLASSES 
AND THE MODE-CONDITIONAL DENSITIES ARE GAUSSIAN 

Consider a simple case in which the number of classes is equal to the number 
of modes and the mode-conditional densities are Gaussian. That is 

$$ p(X \mid \Omega = \ell) \sim N(M_\ell, \Sigma_\ell) $$

Here $d$ refers to a particular assignment of class labels. The probabilities
are computed according to equation (2-4) for this label assignment. Equation
(2-8) can be used as a criterion. However, for Gaussian densities, a simple
criterion can be obtained using the fact that the logarithm is a convex upward
function to derive a lower bound on $L$. Since the logarithm is a convex
upward function, we have the inequality


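$$ \log\left[ \sum_{i} a_i\, g_i(X) \right] \ge \sum_{i} a_i \log[g_i(X)] $$    (2-9)

where

$$ \sum_{i} a_i = 1 \quad \text{and} \quad a_i > 0;\ i = 1, 2, \ldots, C_j $$    (2-10)

Let the density function of the $j$th mode of class $i$ be

(2-11)

Using equations (2-9) to (2-11) in equation (2-8) yields

$$ L \ge -\ldots \log(2\pi) - \tfrac{1}{2} L_2 $$    (2-12)
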

Thus, the optimal class label assignment can be chosen as the one that
minimizes $L_2$. Combinatorial algorithms (ref. 8) can be used to efficiently
generate all possible class label assignments for exhaustive search.


3. PROBABILISTIC CLUSTER LABELING BASED ON MAXIMUM LIKELIHOOD CRITERION 


The last section addressed the problem of obtaining class labels for the
clusters by exhaustive search. This section considers the problem of
obtaining a probabilistic description for the class labels of the clusters.
The criterion used is the likelihood function, but normalized as shown in
equation (3-1).


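$$ L' = \frac{\prod_{i=1}^{C} \left\{ \prod_{j=1}^{N_i} p[X_i(j), \omega_i(j) = i] \right\}^{1/N_i}}{\prod_{i=1}^{C} \left\{ \prod_{j=1}^{N_i} p[X_i(j)] \right\}^{1/N_i}} = \prod_{i=1}^{C} \left\{ \prod_{j=1}^{N_i} \frac{p[X_i(j), \omega_i(j) = i]}{p[X_i(j)]} \right\}^{1/N_i} $$    (3-1)
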


The mixture density $p(X)$ can be written in terms of class-conditional
densities as

$$ p(X) = \sum_{i=1}^{C} P(\omega = i)\, p(X \mid \omega = i) $$    (3-2)

The mixture density $p(X)$ can also be written in terms of mode-conditional
densities as

$$ p(X) = \sum_{\ell=1}^{m} P(\Omega = \ell)\, p(X \mid \Omega = \ell) $$    (3-3)





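On comparing equations (3-2) and (3-3), the following assumption is made.

$$ P(\omega = i)\, p(X \mid \omega = i) = \sum_{\ell=1}^{m} P(\omega = i \mid \Omega = \ell)\, P(\Omega = \ell)\, p(X \mid \Omega = \ell) $$    (3-4)

Equation (3-4) can be rewritten as

$$ p(\omega = i \mid X) = \sum_{\ell=1}^{m} P(\omega = i \mid \Omega = \ell)\, p(\Omega = \ell \mid X) $$    (3-5)
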

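Since the logarithm is a monotonic function of its argument, taking the
logarithm of $L'$ of equation (3-1) and using equation (3-5) yields the
following:

$$ L = \log(L') = \sum_{i=1}^{C} \frac{1}{N_i} \sum_{j=1}^{N_i} \log\left\{ \sum_{\ell=1}^{m} \alpha_{\ell i}\, p[\Omega = \ell \mid X_i(j)] \right\} $$    (3-6)

where $\alpha_{\ell i} = P(\omega = i \mid \Omega = \ell)$ is the probability
that the label of mode $\ell$ is class $i$. The probabilities
$\alpha_{\ell i}$ satisfy the constraints given in equation (3-7).

$$ \alpha_{\ell i} \ge 0;\ i = 1, 2, \ldots, C \ \text{and}\ \ell = 1, 2, \ldots, m; \qquad \sum_{i=1}^{C} \alpha_{\ell i} = 1;\ \ell = 1, 2, \ldots, m $$    (3-7)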


Closed-form solutions for $\alpha_{\ell i}$, by maximizing $L$ of equation
(3-6) subject to the constraints of equation (3-7), seem to be difficult. The
probabilities $\alpha_{\ell i}$ can be obtained using optimization techniques
such as Davidon-Fletcher-Powell (refs. 9, 10, 11).

3.1 FIXED-POINT ITERATION SCHEME FOR OPTIMAL $\alpha_{\ell i}$

The following fixed-point iteration equation [similar to the maximum
likelihood equations in parametric clustering (ref. 2)] for the solution of
the above optimization problem can easily be obtained by introducing
Lagrangian multipliers.

$$ \alpha_{\ell i} = \frac{\frac{1}{N_i} \sum_{j=1}^{N_i} w_{\ell ij}}{\sum_{r=1}^{C} \frac{1}{N_r} \sum_{j=1}^{N_r} w_{\ell rj}} $$    (3-8)

where

$$ w_{\ell ij} = \frac{\alpha_{\ell i}\, p[\Omega = \ell \mid X_i(j)]}{\sum_{k=1}^{m} \alpha_{k i}\, p[\Omega = k \mid X_i(j)]} $$    (3-9)
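
A fixed-point scheme of this kind can be sketched as follows. The update used
here is an assumption obtained by applying Lagrangian multipliers to the
normalized log-likelihood of equation (3-6) under the constraints of equation
(3-7): each alpha is replaced by the class-normalized average of the weights
alpha[l][i] * p(mode l | x) / sum_k alpha[k][i] * p(mode k | x). The function
name fixed_point_alpha and the array layout are hypothetical.

```python
def fixed_point_alpha(post, n_iter=200):
    """Iterate alpha[l][i] ~ P(class i | mode l) to a fixed point.

    post[i][j][l] is p(mode l | X_i(j)), the mode posterior of the j-th
    labeled pattern of class i. Returns alpha as a list of rows, one per mode,
    each row summing to 1 over the classes.
    """
    n_classes = len(post)
    n_modes = len(post[0][0])
    # Start from the uniform assignment; each row sums to 1.
    alpha = [[1.0 / n_classes] * n_classes for _ in range(n_modes)]
    for _ in range(n_iter):
        # s[l][i] accumulates (1/N_i) * sum_j w, computed with the old alpha.
        s = [[0.0] * n_classes for _ in range(n_modes)]
        for i, patterns in enumerate(post):
            for p in patterns:
                denom = sum(alpha[k][i] * p[k] for k in range(n_modes))
                if denom == 0.0:
                    continue  # pattern carries no information under current alpha
                for l in range(n_modes):
                    s[l][i] += alpha[l][i] * p[l] / denom / len(patterns)
        for l in range(n_modes):
            row_sum = sum(s[l])
            if row_sum > 0.0:
                alpha[l] = [v / row_sum for v in s[l]]
    return alpha


# Illustrative posteriors: two classes, two modes, two labeled patterns each.
post = [[[0.9, 0.1], [0.8, 0.2]],
        [[0.2, 0.8], [0.1, 0.9]]]
alpha = fixed_point_alpha(post)
```

With a uniform starting point, the first pass of this sketch reproduces the
ratio of class-averaged mode posteriors, and later passes sharpen it.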


Lennington (ref. 12) derived fixed-point iteration equations similar to
equation (3-8), and Heydorn (ref. 13) developed a least squares solution for
the probabilities $\alpha_{\ell i}$. However, closed-form solutions for
$\alpha_{\ell i}$ can be obtained with the criterion as the maximization of a
lower bound on $L$. Using the inequality (2-9) in equation (3-6), a lower
bound on the log likelihood function $L$ can be obtained as

$$ L \ge \sum_{i=1}^{C} \frac{1}{N_i} \sum_{j=1}^{N_i} \sum_{\ell=1}^{m} p[\Omega = \ell \mid X_i(j)] \log(\alpha_{\ell i}) $$    (3-10)

Introducing the Lagrangian multipliers, the probabilities $\alpha_{\ell i}$
that maximize the lower bound of equation (3-10), subject to the constraints
of equation (3-7), can easily be shown to be

$$ \alpha_{\ell i} = \frac{e_{i\ell}}{\sum_{r=1}^{C} e_{r\ell}} $$    (3-11)

where

$$ e_{i\ell} = \frac{1}{N_i} \sum_{j=1}^{N_i} p[\Omega = \ell \mid X_i(j)] $$    (3-12)

This solution simply states that the probability of the $i$th class label for
a given cluster $\ell$ is the ratio of the a posteriori probability of cluster
$\ell$ given the labeled patterns from class $i$ to the sum over all classes
of the a posteriori probability of cluster $\ell$ given the labeled patterns
from each class. Having obtained $\alpha_{\ell i}$, the proportion of class
$i$, $q_i$, can be estimated as follows. Consider

$$ q_i = \sum_{\ell=1}^{m} P(\omega = i \mid \Omega = \ell)\, P(\Omega = \ell) = \sum_{\ell=1}^{m} \alpha_{\ell i}\, \delta_\ell $$

Hence, $\hat{q}_i$, the estimate of $q_i$, can be computed from the following.

$$ \hat{q}_i = \sum_{\ell=1}^{m} \hat{\alpha}_{\ell i}\, \delta_\ell $$    (3-13)
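
The closed-form labeling probabilities and the resulting proportion estimate
can be illustrated with a short sketch. It follows the verbal description
above (ratio of class-averaged cluster posteriors, then a delta-weighted sum);
the function names, the nested-list layout, and the numbers are hypothetical
and chosen only for illustration.

```python
def closed_form_alpha(post):
    """alpha[l][i]: class-averaged posterior of mode l for class i,
    normalized over classes.

    post[i][j][l] is p(mode l | X_i(j)) for the j-th labeled pattern of class i.
    """
    n_classes = len(post)
    n_modes = len(post[0][0])
    # e[i][l] = (1/N_i) * sum_j p(mode l | X_i(j))
    e = [[sum(p[l] for p in patterns) / len(patterns) for l in range(n_modes)]
         for patterns in post]
    alpha = []
    for l in range(n_modes):
        col = sum(e[i][l] for i in range(n_classes))
        if col > 0.0:
            alpha.append([e[i][l] / col for i in range(n_classes)])
        else:
            # No labeled evidence for this mode: fall back to a uniform row.
            alpha.append([1.0 / n_classes] * n_classes)
    return alpha


def class_proportions(alpha, delta):
    """q[i] = sum_l alpha[l][i] * delta[l], with delta the mode priors."""
    n_classes = len(alpha[0])
    return [sum(alpha[l][i] * delta[l] for l in range(len(delta)))
            for i in range(n_classes)]


# Illustrative posteriors (two classes, two modes) and mode priors.
post = [[[0.9, 0.1], [0.8, 0.2]],
        [[0.2, 0.8], [0.1, 0.9]]]
alpha = closed_form_alpha(post)
q = class_proportions(alpha, delta=[0.4, 0.6])
```

Because each row of alpha sums to 1 and the mode priors sum to 1, the
estimated proportions also sum to 1.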

3.2 EXPRESSIONS FOR THE ASYMPTOTIC MEAN AND VARIANCE OF PROPORTION
ESTIMATE 

One of the objectives in the processing of remotely sensed aerospace imagery
data is the estimation of the proportion of the crop of interest. Assuming
$\delta_\ell$ is constant, expressions are developed in this section for the
asymptotic mean and variance of the proportion estimate $\hat{q}_i$. The
expected value of $\hat{q}_i$ can be written as

The delta method (ref. 14) can be used to compute the asymptotic variance of
$\hat{q}_i$. This involves expanding $\hat{q}_i$ in a Taylor series around its
true value $q_i = \sum_{\ell=1}^{m} \alpha_{\ell i}\, \delta_\ell$. The result
of the expansion is

The covariance of the estimates $\hat{\alpha}_{ui}$ and $\hat{\alpha}_{vi}$
can be expressed as

$$ \mathrm{cov}(\hat{\alpha}_{ui}, \hat{\alpha}_{vi}) = E\{[\hat{\alpha}_{ui} - E(\hat{\alpha}_{ui})][\hat{\alpha}_{vi} - E(\hat{\alpha}_{vi})]\} $$    (3-16)

The estimates $\hat{\alpha}_{ui}$ are functions of the given labeled patterns
$X_r(j)$; $j = 1, 2, \ldots, N_r$; $r = 1, 2, \ldots, C$. Let the mean of
$X_r(j)$ be $\mu_r$. Expanding $\hat{\alpha}_{ui}$ in a Taylor series around
$X_r(j) = \mu_r$; $j = 1, 2, \ldots, N_r$; $r = 1, 2, \ldots, C$ and retaining
only first order terms yields

Thus, to a first order approximation

Similar to equation (3-17), expanding $\hat{\alpha}_{vi}$ in a Taylor series
around $X_r(j) = \mu_r$ and retaining only first order terms yields

Assuming the patterns are independent and using equations (3-17) and (3-19) in
equation (3-16) results in

Differentiation of equation (3-11) yields

Let $\alpha_{ki} = \hat{\alpha}_{ki}(X_r(s) = \mu_r;\ s = 1, 2, \ldots, N_r;\ r = 1, 2, \ldots, C)$.

If the mode-conditional densities are Gaussian, these quantities can easily be
computed from the following. Consider

Using equations (3-22) and (3-23) in equation (3-20) yields

The variance of $\hat{q}_i$ can then be computed from equations (3-15) and
(3-26), using class sample means and covariance matrices for $\mu_r$ and
$\Sigma_r$.



4. CLUSTER LABELING BASED ON THE CRITERION OF 
PROBABILITY OF CORRECT LABELING 

If the class-conditional densities are known, the a posteriori probabilities
of the classes can be expressed as a function of the pattern $X$ and the
a priori probabilities. Since the label of the pattern $X_i(j)$ is $i$, for
particular class-conditional densities and a priori probabilities,
$p[\omega = i \mid X_i(j)]$ is the probability with which the pattern
$X_i(j)$ is correctly recognized. Let $p_{ii}$ be the probability that the
pattern comes from class $i$ and belongs to class $i$ for particular
class-conditional densities and a priori probabilities. Similarly, let
$p_{i\ell}$ be the probability with which the pattern comes from class $i$ and
belongs to class $\ell$. Then, these probabilities can be expressed as

$$ p_{ii} = P(\omega = i) \int p(\omega = i \mid X)\, p(X \mid \omega = i)\, dX = P(\omega = i)\, E_{p(X \mid \omega = i)}[p(\omega = i \mid X)] $$    (4-1)

and

$$ p_{i\ell} = P(\omega = i)\, E_{p(X \mid \omega = i)}[p(\omega = \ell \mid X)] $$    (4-2)

The probability of correct labeling, or the probability with which a pattern
comes from a particular class and belongs to the same class, is

$$ P_s = \sum_{i=1}^{C} p_{ii} $$    (4-3)

The error probability, or the probability with which the pattern comes from a
particular class and belongs to some other class, is

$$ P_e = \sum_{i=1}^{C} \sum_{\ell \ne i} p_{i\ell} $$    (4-4)

From equations (4-1) to (4-4), it is easily seen that

$$ P_s + P_e = 1 $$    (4-5)


It is observed that equations (4-1) and (4-3) are based on treating the
a posteriori probabilities of the classes as continuous variables and differ
from the usual estimates based on counts. The probability $P_s$ can be
estimated from the given labeled pattern set as follows.

$$ \hat{P}_s = \sum_{i=1}^{C} q_i\, v_i $$    (4-6)

where

$$ v_i = \frac{1}{N_i} \sum_{j=1}^{N_i} r_i(j) \quad \text{and} \quad r_i(j) = p[\omega = i \mid X_i(j)] $$    (4-7)
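
The difference between this estimate and the usual count-based one can be seen
in a small sketch. The soft estimate averages the a posteriori probability of
the given label, while the count-based estimate averages 0/1 decisions of a
maximum a posteriori classifier. The function names and the sample posteriors
are hypothetical.

```python
def soft_estimate(post, priors):
    """Sum over classes of q_i times the mean posterior of the given label.

    post[i][j][k] is p(class k | X_i(j)); priors[i] is q_i.
    """
    return sum(q * sum(p[i] for p in patterns) / len(patterns)
               for i, (q, patterns) in enumerate(zip(priors, post)))


def count_estimate(post, priors):
    """Same weighting, but each pattern contributes 1 if the most probable
    class equals its given label and 0 otherwise."""
    total = 0.0
    for i, (q, patterns) in enumerate(zip(priors, post)):
        hits = sum(1 for p in patterns
                   if max(range(len(p)), key=p.__getitem__) == i)
        total += q * hits / len(patterns)
    return total


# Illustrative class posteriors for three patterns of class 0 and two of class 1.
post = [[[0.9, 0.1], [0.6, 0.4], [0.3, 0.7]],
        [[0.2, 0.8], [0.4, 0.6]]]
priors = [0.5, 0.5]
p_soft = soft_estimate(post, priors)
p_count = count_estimate(post, priors)
```

The soft estimate uses how confident each posterior is, not only whether the
decision was right, which is the source of the variance reduction discussed
next.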


The following analysis shows that the estimate for $P_s$ of equation (4-6) has
less variance than the estimate based on counting the classification errors.
The estimate of equation (4-6) is unbiased. That is

$$ E(\hat{P}_s) = \sum_{i=1}^{C} q_i\, E(v_i) = \sum_{i=1}^{C} q_i\, u_i = P_s $$    (4-8)

Assuming the patterns are independent, an expression for the variance of
$\hat{P}_s$ can be obtained as follows. Consider

$$ \mathrm{Var}(\hat{P}_s) = \sum_{i=1}^{C} q_i^2\, \frac{1}{N_i^2} \sum_{j=1}^{N_i} \left\{ E([r_i(j)]^2) - u_i^2 \right\} $$    (4-9)

But, we have the relationship

$$ 0 \le r_i(j) \le 1 $$    (4-10)

Using equation (4-10) in equation (4-9) yields

$$ \mathrm{Var}(\hat{P}_s) \le \sum_{i=1}^{C} q_i^2\, \frac{1}{N_i}\, u_i (1 - u_i) $$    (4-11)


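Hence, the variance of $\hat{P}_s$ is less than the variance of the estimate
based on counts of correctly classified patterns (ref. 15). The criterion of
either the maximization of $\hat{P}_s$ or the minimization of $\hat{P}_e$ can
be used to obtain a probabilistic class label assignment for the clusters.
Using the relationship of equation (3-4) between the class-conditional
densities and the cluster-conditional densities in terms of the probabilities
$\alpha_{\ell i}$ in equation (4-6) results in

$$ \hat{P}_s = \sum_{i=1}^{C} q_i\, \frac{1}{N_i} \sum_{j=1}^{N_i} \sum_{\ell=1}^{m} \alpha_{\ell i}\, p[\Omega = \ell \mid X_i(j)] = \sum_{i=1}^{C} \sum_{\ell=1}^{m} q_i\, \alpha_{\ell i}\, e_{i\ell} $$    (4-12)

where

$$ e_{i\ell} = \frac{1}{N_i} \sum_{j=1}^{N_i} p[\Omega = \ell \mid X_i(j)] $$    (4-13)

The probabilities $q_i$ and $\alpha_{\ell i}$ are related as follows.

$$ q_i = \sum_{\ell=1}^{m} \alpha_{\ell i}\, \delta_\ell;\ i = 1, 2, \ldots, C $$    (4-14)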


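Now the problem of estimation of proportions and the probabilities of class
labels for the clusters can be formulated as follows.

Find: $q_i$, $i = 1, 2, \ldots, C$ and $\alpha_{\ell i}$; $i = 1, 2, \ldots, C$;
$\ell = 1, 2, \ldots, m$ such that $\hat{P}_s$ is maximized, where

$$ \hat{P}_s = \sum_{i=1}^{C} \sum_{\ell=1}^{m} q_i\, \alpha_{\ell i}\, e_{i\ell} $$    (4-15)

Subject to the constraints

$$ \sum_{i=1}^{C} \alpha_{\ell i} = 1;\ \ell = 1, 2, \ldots, m $$

and

$$ \sum_{\ell=1}^{m} \alpha_{\ell i}\, \delta_\ell = q_i;\ i = 1, 2, \ldots, C $$
$$ q_i \ge 0;\ i = 1, 2, \ldots, C $$
$$ \alpha_{\ell i} \ge 0;\ i = 1, 2, \ldots, C;\ \ell = 1, 2, \ldots, m $$    (4-16)

Comparing equations (3-6) and (4-15), it is seen that $q_i$ now enters the
problem directly. Optimization techniques such as Davidon-Fletcher-Powell
(refs. 9, 10, 11) can be used to solve the above problem.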


5. USE OF LABELED AND UNLABELED PATTERNS FOR
PROBABILISTIC CLUSTER LABELING


One of the important objectives in the processing of remotely sensed imagery
data is to estimate the proportions of classes of interest. Ideally, these
estimates should be unbiased and of minimum variance. It is the purpose of
this section to develop a scheme that uses both the given labeled and
unlabeled patterns for obtaining the probabilities of class labels for the
clusters by minimizing the variance of the proportion estimates. It is
assumed that we are given a set of labeled patterns $X_i(j)$ with labels
$\omega_i(j) = i$; $j = 1, 2, \ldots, N_i$; $i = 1, 2, \ldots, C$ and a set of
$N_u$ unlabeled patterns. Let $N_T$ be the total number of labeled and
unlabeled patterns. That is

$$ N_T = \sum_{i=1}^{C} N_i + N_u $$

Let $Y_\ell$, $\ell = 1, 2, \ldots, N_T$ be the given labeled and unlabeled
patterns. Let the Bayes classifier be used to classify the given labeled and
unlabeled pattern sets. For particular class-conditional densities and
a priori probabilities, let the resulting confusion matrix of the given
labeled pattern set and the classification matrix of the unlabeled pattern set
be as shown in table 1.

Let $\omega$ be the given label and $\omega_c$ be the classifier label. Let
$\lambda_{ij} = P(\omega = i \mid \omega_c = j)$ be the probability that the
true label is $i$, given that the classifier label is $j$. Let
$p_{ij} = P(\omega = i, \omega_c = j)$ be the probability that the true label
of the pattern is $i$ and the classifier label is $j$. Let
$P_c(i) = P(\omega_c = i)$ be the probability that the classifier classifies a
pattern into class $i$ and $q_i = P(\omega = i)$ be the a priori probability
of class $i$. Then we obtain

$$ p_{ij} = P(\omega = i, \omega_c = j) = P(\omega_c = j)\, P(\omega = i \mid \omega_c = j) = P_c(j)\, \lambda_{ij} $$    (5-1)


TABLE 1.- CLASSIFICATIONS OF LABELED AND UNLABELED PATTERN SETS

(a) Confusion matrix of labeled pattern set

                           Classifier label            Number belonging
True label             1       2      ...     C        to each class
    1                n_11    n_12     ...    n_1C       n_1. = N_1
    2                n_21    n_22     ...    n_2C       n_2. = N_2
    .                  .       .               .            .
    .                  .       .               .            .
    C                n_C1    n_C2     ...    n_CC       n_C. = N_C
Number classified
into each class      n_.1    n_.2     ...    n_.C       n_.. = n

(b) Matrix of classifications of unlabeled set

Classifier label       1       2      ...     C
                      V_1     V_2     ...    V_C

where

  n_ij = number of labeled patterns for which the true or given label is i
         and the classifier label is j

  C    = number of classes

  n_i. = sum over j = 1, ..., C of n_ij

  n_.j = sum over i = 1, ..., C of n_ij

  n    = n_.. = sum over i and j of n_ij, the total number of labeled patterns

  V_j  = number of unlabeled patterns for which the classifier label is j



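Since each classification is independent, the likelihood function of the
observed n's and V's can be written as

$$ L = K \prod_{i=1}^{C} \prod_{j=1}^{C} (\lambda_{ij})^{n_{ij}} \prod_{j=1}^{C} [P_c(j)]^{V_j + n_{.j}} $$    (5-2)

where $K$ is a constant. The values of $P_c(j)$ and $\lambda_{ij}$ which
maximize $L$, subject to the probability constraints on $P_c(j)$ and
$\lambda_{ij}$, can be shown to be (refs. 16, 17)

$$ \hat{P}_c(j) = \frac{n_{.j} + V_j}{\sum_{k=1}^{C} (n_{.k} + V_k)} $$    (5-3)

and

$$ \hat{\lambda}_{ij} = \frac{n_{ij}}{n_{.j}} $$    (5-4)

An estimate $\hat{q}_i$ for the proportion may be obtained as follows.

$$ q_i = P(\omega = i) = \sum_{j=1}^{C} P(\omega = i, \omega_c = j) = \sum_{j=1}^{C} P_c(j)\, \lambda_{ij} $$    (5-5)

From equations (5-3) through (5-5), the following is obtained.

$$ \hat{q}_i = \frac{1}{n + N_u} \sum_{j=1}^{C} \frac{n_{ij}}{n_{.j}}\, (n_{.j} + V_j) $$    (5-6)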


The estimate of equation (5-6) can be interpreted as follows. The ratio
$n_{ij}/n_{.j}$ gives the proportion of the patterns truly belonging to class
$i$ among the patterns classified into class $j$. Multiplying this ratio by
$(n_{.j} + V_j)$ and summing over $j$ from 1 to $C$ gives an estimate of the
number of patterns of class $i$ among the patterns classified into all the
classes. Dividing this by the total number of patterns gives an estimate for
the proportion of class $i$.
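
The interpretation above translates directly into a few lines of code. The
sketch below assumes the confusion counts and the unlabeled classification
counts are available as plain lists; the function name proportion_estimates
and the example counts are hypothetical.

```python
def proportion_estimates(n, v):
    """Class proportion estimates from labeled and unlabeled classifications.

    n[i][j]: number of labeled patterns with true label i and classifier label j.
    v[j]:    number of unlabeled patterns with classifier label j.
    Columns with no labeled patterns are skipped, since the ratio
    n[i][j] / n_.j is undefined for them.
    """
    n_classes = len(n)
    col = [sum(n[i][j] for i in range(n_classes)) for j in range(n_classes)]
    total = sum(col) + sum(v)  # all labeled plus all unlabeled patterns
    return [sum(n[i][j] / col[j] * (col[j] + v[j])
                for j in range(n_classes) if col[j] > 0) / total
            for i in range(n_classes)]


# Illustrative counts: 100 labeled and 900 unlabeled patterns, two classes.
n = [[40, 10],
     [5, 45]]
v = [300, 600]
q_hat = proportion_estimates(n, v)
```

When every classifier column contains at least one labeled pattern, the
estimates sum to 1 by construction.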





It can be shown (refs. 16, 17) that the estimate of equation (5-6) is
asymptotically unbiased and its asymptotic variance is given by the following.

$$ \mathrm{Var}(\hat{q}_i) = \sum_{j=1}^{C} \frac{\lambda_{ij}(1 - \lambda_{ij})\, P_c(j)}{n} + \frac{\sum_{j=1}^{C} P_c(j)\, \lambda_{ij}^2 - q_i^2}{N_u} $$    (5-7)

$$ = \frac{q_i}{n} - \frac{q_i^2}{N_u} + \left( \frac{1}{N_u} - \frac{1}{n} \right) \sum_{j=1}^{C} P_c(j)\, \lambda_{ij}^2 $$    (5-8)

For particular a priori probabilities and class-conditional densities, the
probabilities $P_c(j)$ and $\lambda_{ij}$ may be expressed as

$$ P_c(j) = \int p(\omega = j \mid X)\, p(X)\, dX $$    (5-9)

and

$$ \lambda_{ij} = P(\omega = i \mid \omega_c = j) = \frac{P(\omega = i, \omega_c = j)}{\sum_{i=1}^{C} P(\omega = i, \omega_c = j)} = \frac{q_i \int p(\omega = j \mid X)\, p(X \mid \omega = i)\, dX}{\sum_{i=1}^{C} q_i \int p(\omega = j \mid X)\, p(X \mid \omega = i)\, dX} $$    (5-10)


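Using the given labeled and unlabeled patterns, estimates for $P_c(j)$ and
$\lambda_{ij}$ can be obtained from equations (3-5), (5-9), and (5-10) as
follows:

$$ \hat{P}_c(j) = \frac{1}{N_T} \sum_{\ell=1}^{N_T} p(\omega = j \mid Y_\ell) = \sum_{s=1}^{m} \alpha_{sj}\, e_{us} $$    (5-11)

where

$$ e_{us} = \frac{1}{N_T} \sum_{\ell=1}^{N_T} p(\Omega = s \mid Y_\ell) $$

and

$$ \hat{\lambda}_{ij} = \frac{q_i \sum_{s=1}^{m} \alpha_{sj}\, e_{is}}{\sum_{r=1}^{C} q_r \sum_{s=1}^{m} \alpha_{sj}\, e_{rs}} $$    (5-12)

Using equations (5-11) and (5-12) in equation (5-5) yields

$$ q_i = \sum_{j=1}^{C} \hat{P}_c(j)\, \hat{\lambda}_{ij};\ i = 1, 2, \ldots, C $$    (5-13)

Let $S$ be the sum of the asymptotic variances of the proportion estimates.
From equations (5-8), (5-11), and (5-12), the following estimate for $S$ is
obtained.

$$ \hat{S} = \sum_{i=1}^{C} \mathrm{Var}(\hat{q}_i) $$    (5-14)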



Now the problem of obtaining a probabilistic class label assignment for the
clusters may be formulated as follows.

Find: $\alpha_{sj}$; $s = 1, 2, \ldots, m$; $j = 1, 2, \ldots, C$; and $q_i$,
$i = 1, 2, \ldots, C$ such that $\hat{S}$ of equation (5-14) is minimized,
subject to the constraints of equations (4-16) and (5-13).


Optimization techniques such as Davidon-Fletcher-Powell (refs. 9, 10, 11) can
be used to solve the above optimization problem.


6. FORMULATION WITH LABEL IMPERFECTIONS OF THE GIVEN LABELED SET 


In practice, such as in the classification of remotely sensed multispectral
scanner imagery data, it is difficult and expensive to obtain labels for the
pattern set. The labels for the patterns are usually provided by an analyst
interpreter on examining imagery films and using other information such as
historic information, crop calendar models, etc. These labels are very often
imperfect. Recently, Chittineni (refs. 16, 18, 19) investigated techniques
for estimating the probabilities of label imperfections. Once the
probabilities of label imperfections are known, these can be used in obtaining
the class labels for the clusters through their densities and proportions.
Let $\omega$ and $\omega'$ be the perfect and imperfect labels, respectively,
each of which takes values $1, 2, \ldots, C$. The imperfections in the labels
are described by the probabilities

$$ \beta_{ji} = P(\omega' = i \mid \omega = j) $$    (6-1)

where

$$ \sum_{i=1}^{C} \beta_{ji} = 1 $$    (6-2)


The a priori probabilities, the class-conditional densities, and the
a posteriori probabilities of the classes with and without imperfections in
the labels are related in terms of the probabilities of label imperfections as
(ref. 18)

$$ P(\omega' = i) = \sum_{j=1}^{C} \beta_{ji}\, P(\omega = j) $$    (6-3)

$$ P(\omega' = i)\, p(X \mid \omega' = i) = \sum_{j=1}^{C} \beta_{ji}\, P(\omega = j)\, p(X \mid \omega = j) $$    (6-4)

$$ p(\omega' = i \mid X) = \sum_{j=1}^{C} \beta_{ji}\, p(\omega = j \mid X) $$    (6-5)

where it is assumed that

$$ p(X \mid \omega = j) = p(X \mid \omega' = i, \omega = j) $$    (6-6)

Let $\beta$ be the matrix of probabilities of label imperfections, where

$$ \beta = [\beta_{ij}] $$    (6-7)

Let

$$ \gamma = (\beta^T)^{-1} $$    (6-8)

Using equations (6-7) and (6-8) and inverting equation (6-4) results in

$$ P(\omega = i)\, p(X \mid \omega = i) = \sum_{j=1}^{C} \gamma_{ij}\, P(\omega' = j)\, p(X \mid \omega' = j) $$    (6-9)
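
Integrating this inversion over X gives the same relation for the a priori
probabilities alone, which makes a convenient numerical check. The sketch
below assumes the convention beta[j][i] = P(imperfect label i | true label j),
solves the transposed system by Gauss-Jordan elimination rather than forming
the inverse explicitly, and uses hypothetical names (solve, true_priors) and
illustrative numbers.

```python
def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[k]] for k, row in enumerate(a)]  # augmented matrix
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(m[r][c]))
        if abs(m[piv][c]) < 1e-12:
            raise ValueError("label-imperfection matrix is singular")
        m[c], m[piv] = m[piv], m[c]
        for r in range(n):
            if r != c:
                f = m[r][c] / m[c][c]
                m[r] = [x - f * y for x, y in zip(m[r], m[c])]
    return [m[k][n] / m[k][k] for k in range(n)]


def observed_priors(beta, true_q):
    """P(omega' = i) = sum_j beta[j][i] * P(omega = j)."""
    n = len(beta)
    return [sum(beta[j][i] * true_q[j] for j in range(n)) for i in range(n)]


def true_priors(beta, observed):
    """Recover P(omega = j) from P(omega' = i) by solving beta^T q = q'."""
    n = len(beta)
    beta_t = [[beta[j][i] for j in range(n)] for i in range(n)]
    return solve(beta_t, observed)


# Illustrative two-class example: rows of beta sum to 1.
beta = [[0.8, 0.2],
        [0.1, 0.9]]
q_obs = observed_priors(beta, [0.3, 0.7])
q_true = true_priors(beta, q_obs)
```

The recovered priors are only as good as the estimate of beta; a nearly
singular beta makes the correction unstable.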


Using these relationships, the criteria developed in the previous sections for
labeling the clusters can be reformulated to take into account the
imperfections in the labels, once $\beta_{ji}$ are known or estimated. In the
following, it is assumed that the probabilities of label imperfections
$\beta_{ji}$ are available.


6.1 MAXIMUM LIKELIHOOD CRITERION WITH IMPERFECTIONS IN THE LABELS 

The log of the likelihood function of equation (3-1) with imperfections in the
labels can be written as

$$ L = \sum_{i=1}^{C} \frac{1}{N_i} \sum_{j=1}^{N_i} \log\{ p[\omega'_i(j) = i \mid X_i(j)] \} $$    (6-10)

Using equations (3-5) and (6-5) in equation (6-10) yields

$$ L = \sum_{i=1}^{C} \frac{1}{N_i} \sum_{j=1}^{N_i} \log\left\{ \sum_{\ell=1}^{C} \beta_{\ell i}\, p[\omega = \ell \mid X_i(j)] \right\} = \sum_{i=1}^{C} \frac{1}{N_i} \sum_{j=1}^{N_i} \log\left\{ \sum_{\ell=1}^{C} \sum_{r=1}^{m} \alpha_{r\ell}\, \beta_{\ell i}\, p[\Omega = r \mid X_i(j)] \right\} $$    (6-11)

For a given $\beta$, the problem of estimating $\alpha_{r\ell}$ can be
formulated as follows.

Find: $\alpha_{r\ell}$; $r = 1, 2, \ldots, m$; $\ell = 1, 2, \ldots, C$ such
that $L$ of equation (6-11) is maximized, subject to the constraints of
equation (3-7).

Closed-form solutions for the above optimization problem seem to be difficult.
However, the following fixed-point iteration scheme, similar to equation
(3-8), can easily be obtained by introducing the Lagrangian multipliers.

$$ \alpha_{r\ell} = \frac{\sum_{i=1}^{C} \frac{1}{N_i} \sum_{j=1}^{N_i} w_{ij\ell r}}{\sum_{\ell=1}^{C} \sum_{i=1}^{C} \frac{1}{N_i} \sum_{j=1}^{N_i} w_{ij\ell r}} $$    (6-12)

where

$$ w_{ij\ell r} = \frac{\alpha_{r\ell}\, \beta_{\ell i}\, p[\Omega = r \mid X_i(j)]}{\sum_{s=1}^{C} \sum_{k=1}^{m} \alpha_{ks}\, \beta_{si}\, p[\Omega = k \mid X_i(j)]} $$    (6-13)

Also, optimization methods such as Davidon-Fletcher-Powell (refs. 9, 10, 11)
can be used to solve the above optimization problem.


6.2 THE CRITERION OF PROBABILITY OF CORRECT LABELING WITH LABEL IMPERFECTIONS 


In section 4, the probability of correct labeling is proposed as a criterion
for obtaining the probabilities of class labels for the clusters. From
equations (4-1), (4-3), and (6-9), we obtain

$$ P_s = \sum_{i=1}^{C} \int [p(\omega = i \mid X)]\,[P(\omega = i)\, p(X \mid \omega = i)]\, dX = \sum_{i=1}^{C} \sum_{j=1}^{C} \gamma_{ij}\, P(\omega' = j) \int p(\omega = i \mid X)\, p(X \mid \omega' = j)\, dX $$    (6-14)

An estimate for $P_s$ can be obtained in terms of the given, imperfectly
labeled patterns as

$$ \hat{P}_s = \sum_{i=1}^{C} \sum_{j=1}^{C} \gamma_{ij}\, P(\omega' = j)\, \frac{1}{N_j} \sum_{k=1}^{N_j} p[\omega = i \mid X_j(k)] $$    (6-15)

Using equations (3-5) and (6-3) in equation (6-15) yields

$$ \hat{P}_s = \sum_{i=1}^{C} \sum_{j=1}^{C} \sum_{s=1}^{C} \sum_{r=1}^{m} \gamma_{ij}\, \beta_{sj}\, q_s\, \alpha_{ri}\, e_{jr} $$    (6-16)

where

$$ e_{jr} = \frac{1}{N_j} \sum_{k=1}^{N_j} p[\Omega = r \mid X_j(k)] $$    (6-17)

Now the problem can be formulated as follows.

Find: $\alpha_{ri}$; $i = 1, 2, \ldots, C$; $r = 1, 2, \ldots, m$ and $q_s$,
$s = 1, 2, \ldots, C$ such that $\hat{P}_s$ of equation (6-16) is maximized,
subject to the constraints of equation (4-16).

Optimization techniques such as Davidon-Fletcher-Powell (refs. 9, 10, 11) can
be used to solve the above optimization problem.

6.3 THE CRITERION OF VARIANCE OF PROPORTION ESTIMATE WITH IMPERFECTIONS 
IN THE LABELS 


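The probabilities of label imperfections can be taken into account in
estimating the probabilities of class labels for the modes through the
probabilities $\lambda_{ij}$. Using equation (6-9) in equation (5-10) yields

$$ \lambda_{ij} = \frac{\sum_{\ell=1}^{C} \gamma_{i\ell}\, P(\omega' = \ell) \int p(\omega = j \mid X)\, p(X \mid \omega' = \ell)\, dX}{\sum_{i=1}^{C} \sum_{\ell=1}^{C} \gamma_{i\ell}\, P(\omega' = \ell) \int p(\omega = j \mid X)\, p(X \mid \omega' = \ell)\, dX} $$    (6-18)

An estimate for $\lambda_{ij}$ can be obtained from the given, imperfectly
labeled patterns as

$$ \hat{\lambda}_{ij} = \frac{\sum_{\ell=1}^{C} \gamma_{i\ell}\, P(\omega' = \ell)\, \frac{1}{N_\ell} \sum_{s=1}^{N_\ell} p[\omega = j \mid X_\ell(s)]}{\sum_{i=1}^{C} \sum_{\ell=1}^{C} \gamma_{i\ell}\, P(\omega' = \ell)\, \frac{1}{N_\ell} \sum_{s=1}^{N_\ell} p[\omega = j \mid X_\ell(s)]} $$    (6-19)

Using equations (6-3), (3-5), (6-19), and (5-11) in equation (5-8) yields a
criterion similar to equation (5-14).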



7. EXPERIMENTAL RESULTS 


This section presents some results obtained in the processing of remotely
sensed Landsat multispectral scanner (MSS) imagery data. The objective of the
processing is to estimate the proportion of the class of interest in each
image. Several segments were processed in the following manner. (A segment
is a 9 by 11 kilometer or 5 by 6 nautical mile area for which the MSS image is
divided into a rectangular array of pixels, 117 rows by 196 columns.) The
image is overlaid with a rectangular grid of 209 grid intersections. Ground
truth labels, or true labels, of the pixels or dots corresponding to each grid
intersection are acquired. Also, for a subset of the pixels of the 209 grid
intersections, the labels are provided by an analyst interpreter (AI) by
examining the imagery films and using information such as historic
information, crop calendar models, etc. These are imperfect labels. There
are two classes in the image. Class 1 is wheat and class 2 is nonwheat,
designated "other." The class of interest is wheat.

Several acquisitions are used for each segment. The number of features or the
number of channels used for each segment is listed in table 2. The Gaussian
mode-conditional densities and a priori probabilities of the inherent modes in
the data of each segment are obtained using a maximum likelihood clustering
algorithm (refs. 2, 20). The number of modes generated for each segment is
listed in table 2. The probabilities of label imperfections of the AI labels,
or $\beta$ matrix, are estimated for each segment and are listed in table 2.
The theory developed in section 3 is applied in estimating the probabilities
of class labels for the modes of each segment using AI labeled patterns and
using ground truth labeled patterns. The proportion of class 1, the class of
interest, is estimated for each segment using equation (3-13) and listed in
table 2.

The theory developed in section 6.1 is used with the AI labeled patterns and
the corresponding $\beta$ matrix in estimating the probabilities of class
labels for the modes. Equation (3-13) is then used in estimating the
proportion of class 1 for each segment, and the results are listed in table 2.


TABLE 2.- ESTIMATION OF PROPORTION OF CLASS 1 WITH MAXIMUM LIKELIHOOD CRITERION 




From table 2, it is observed that the estimates obtained with the closed-form
solution of equation (3-11) for the probabilities of class labels for the
modes are in close agreement with the ones obtained using the fixed-point
iteration scheme. Better proportion estimates are obtained by taking the
imperfections in the AI labels into account through the $\beta$ matrix instead
of estimating the proportions directly using AI labeled patterns.

The estimated proportions of class 1 by exhaustive search using the maximum
likelihood criterion and the maximization of probability of correct labeling
criterion with both the AI and ground truth labels are listed in table 3.

The estimated proportions of class 1 from the given, randomly labeled patterns
are listed in table 4 for all the processed segments. On comparison of
tables 2, 3, and 4, it is seen that there is improvement in the estimates
through machine processing.

For all the segments, the method developed in reference 19 is used to estimate
the probabilities of label imperfections of the AI labels. The number of
labeled patterns used for each segment is listed in table 5, and the number of
unlabeled patterns used is 836. The estimated labeling accuracies and the
proportion estimates obtained using the maximum likelihood criterion to label
the clusters with these probabilities of label imperfections are listed in
table 5. From table 5, it is seen that there is considerable improvement in
the proportion estimates with the use of the estimated $\beta$-matrix over
that obtained directly using imperfectly labeled patterns.



TABLE 3.- ESTIMATION OF PROPORTION OF CLASS 1 BY EXHAUSTIVE SEARCH
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TABLE 4.- ESTIMATION OF PROPORTION OF CLASS 1 
FROM RANDOMLY LABELED PATTERNS 


Segment 

Number of 
labeled 
patterns 

Proportion of class 1 estimated 
from labeled patterns 

6T 

proportion 

AI labels 



oT labels 

a 1005 

t 

97 

0.2061 

0.3368 

0.348 

a 1060 

106 

0.1604 

0.2830 

0.231 

a 1231 

96 

0.7395 

0.7604 

0.744 

b 1520 

91 

0.2197 

0.2637 

0.301 

b 1604 

101 

0.3069 

0.4950 

0.524 

b 1675 

107 

0.0934 

0.2897 

0.291 

b 1805 

144 

0.1042 

0.1389 

0.164 

a 1853 

91 

0.2637 

0.2857 

0.306 

b i899 

95 

•0.6316 

0.6484 

0.596 

Bias 


0 ,8661 IE— 01 

0.37778E-03 


MSE 


0.13840E-01 

0.10134E-02 



Segments in which class 1 Is winter wheat. 
^Segments in which class 1 Is 'spring wheat. 
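
The Bias and MSE rows of table 4 can be reproduced from the tabulated proportions. The short sketch below assumes that the bias is the mean of (GT proportion minus estimate) over the nine segments and that the MSE is the mean of the squared differences; this sign convention is inferred from the numbers in the table, not stated in the text. The function name `bias_and_mse` is illustrative only.

```python
# Proportions of class 1 from table 4, one entry per segment (1005 ... 1899).
ai_labels = [0.2061, 0.1604, 0.7395, 0.2197, 0.3069, 0.0934, 0.1042, 0.2637, 0.6316]
gt_labels = [0.3368, 0.2830, 0.7604, 0.2637, 0.4950, 0.2897, 0.1389, 0.2857, 0.6484]
gt_proportion = [0.348, 0.231, 0.744, 0.301, 0.524, 0.291, 0.164, 0.306, 0.596]


def bias_and_mse(estimates, truth):
    """Return (mean error, mean squared error) with error = truth - estimate.

    The sign convention is an assumption inferred from the table values.
    """
    errors = [t - e for e, t in zip(estimates, truth)]
    n = len(errors)
    bias = sum(errors) / n
    mse = sum(err * err for err in errors) / n
    return bias, mse


bias_ai, mse_ai = bias_and_mse(ai_labels, gt_proportion)
bias_gt, mse_gt = bias_and_mse(gt_labels, gt_proportion)
print(bias_ai, mse_ai)
print(bias_gt, mse_gt)
```

Under this convention the computed values agree with the Bias and MSE rows of the table to the printed precision.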






TABLE 5.- ESTIMATED LABELING ERRORS AND PROPORTION ESTIMATES 


                               Number of AI       Estimated             Computed
                               labeled patterns   $\beta$-matrix using  $\beta$-matrix        $\hat{P}_1$ using   $\hat{P}_1$ directly
                                                  method developed      comparing AI and      $\beta$ of          with AI              GT
Segment   Location             Wheat    Other     in reference 19       GT labels             column 3            labels               proportion

1005      Sherman,               20       77      [0.8284  0.1716]      [0.5455  0.4545]      0.2723              0.2456               0.348
          Texas                                   [0.0165  0.9835]      [0.0303  0.9692]

1060      Cheyenne,              17       89      [0.5732  0.4268]      [0.5667  0.4333]      0.2173              0.1975               0.231
          Colorado                                [0.0431  0.9569]      [0.0293  0.9737]

1231      Jackson,               71       25      [0.9586  0.0414]      [0.9315  0.0685]      0.7057              0.6265               0.744
          Oklahoma                                [0.1330  0.8670]      [0.1304  0.8696]

1520      Big Stone,             20       71      [0.8629  0.1371]      [0.7917  0.2033]      0.2154              0.2109               0.301
          Montana                                 [0.0363  0.9637]      [0.0149  0.9851]

1604      Renville,              31       70      [0.6155  0.3845]      [0.4600  0.5400]      0.3496              0.2962               0.524
          N. Dakota                               [0.0000  1.0000]      [0.1569  0.8431]

1675      McPherson,             10       97      [0.5481  0.4519]      [0.2667  0.7333]      0.1932              0.1035               0.291
          S. Dakota                               [0.0005  0.9995]      [0.0390  0.9610]

1805      Gregory,               15      129      [0.5227  0.4773]      [0.4211  0.5789]      0.1757              0.1181               0.164
          S. Dakota                               [0.0626  0.9374]      [0.0640  0.9360]

1853      Ness,                  24       67      [0.6342  0.3658]      [0.8077  0.1923]      0.3563              0.3246               0.306
          Kansas                                  [0.0027  0.9973]      [0.0515  0.93S5]

1899      Walsh,                 60       35      [0.9972  0.0028]      [0.8544  0.1356]      0.6216              0.6282               0.596
          N. Dakota                               [0.0356  0.9644]      [0.2500  0.7500]

Bias                                                                                          0.4421E-01          0.832E-01
MSE                                                                                           0.6446E-02          0.13575E-01




8. CONCLUDING SUMMARY 


In the classification of imagery data such as in the machine processing of 
remotely sensed multispectral scanner data, unsupervised classification 
techniques have been found to be effective. Clustering techniques break up 
the image into its inherent modes. One of the crucial problems in the machine 
classification of imagery data is to label these clusters. 

This paper addressed the problem of labeling the modes and proposed various 
techniques. It is assumed that the a priori probabilities of the modes and 
the mode-conditional probability densities are available. It is also assumed 
that a set of labeled patterns from the classes of the data and a set of 
unlabeled patterns are given. The labels of these patterns might be 
imperfect. 

Using the given labeled pattern set, the problem of assigning the class 
labels to the modes is formulated as a combinatorial problem. If the number 
of modes is small, the best assignment of class labels to the modes can easily be 
obtained by exhaustive search, using a criterion such as maximum likelihood. 

The problem is also formulated as that of obtaining a probabilistic class label 
assignment to the modes using maximization of either the likelihood function 
or the probability of correct labeling as a criterion. A closed form solution 
is obtained for the probabilities of class labels to the modes with the 
maximization of a lower bound on the likelihood function as a criterion. With 
this solution and using the Taylor series expansion, expressions are developed 
for the asymptotic mean and variance of the proportion estimates of the 
classes. In the processing of remotely sensed data, one of the important 
objectives is to estimate the proportion of the class of interest. Using the 
given labeled and unlabeled patterns, the problem of obtaining class labels 
for the modes is formulated as that of minimizing the variance of the 
proportion estimates of the classes. 


The criteria of the maximum likelihood, maximization of probability of correct 
labeling, and minimization of variance of proportion estimates are 
reformulated to take into account label imperfections in the given, labeled 
set for known probabilities of label imperfections. 

In practice, it is often of interest to group the modes into their natural 
classes. A procedure that uses unlabeled patterns based on probability of 
error as a criterion is proposed for grouping the modes into their natural 
classes. Also, the problem of proportion estimation through cluster labeling 
with impure clusters is addressed. 

Furthermore, experimental results in the processing of remotely sensed 
multispectral scanner imagery data are presented. 
APPENDIX A 

USE OF UNLABELED PATTERNS FOR ASSIGNING MODES TO THEIR NATURAL CLASSES 


In practice, it is often of interest to group the modes generated by the 
clustering algorithm into their natural classes. An approach is proposed in 
this appendix for grouping the clusters into their respective classes based on 
the Bayes probability of error as a criterion and using unlabeled patterns. It is 
assumed that a set of unlabeled patterns $X_i$, $i = 1, 2, \ldots, N$ is given, and 
that the number of classes $C$ is given. 


The Bayes classifier assigns a pattern $X$ to the class whose a posteriori 
probability is largest. The conditional probability of error when $X$ is 
classified according to the Bayes decision rule is 

\[
r(X) = 1 - \max_i \left[ p(\omega = i \mid X) \right] \tag{A-1}
\]

The Bayes probability of error is then given by 

\[
P_e = E[r(X)] = \int r(X)\, p(X)\, dX \tag{A-2}
\]

Thus, if $r(X)$ is known as a function of $X$, the Bayes probability of error $P_e$ 
can be estimated by the sample mean of $r(X_i)$ over the $N$ unlabeled patterns as 

\[
\hat{P}_e = \frac{1}{N} \sum_{i=1}^{N} r(X_i) \tag{A-3}
\]

The estimate of equation (A-3) is unbiased and its variance is given by 
(ref. 21) 


\[
\operatorname{Var}(\hat{P}_e) = \frac{1}{N} \left\{ E[r^2(X)] - P_e^2 \right\}
\le \frac{P_e (1 - P_e)}{N} - \frac{P_e}{CN} \tag{A-4}
\]

The variance of $\hat{P}_e$ is at least $P_e/(CN)$ less than the variance of the error estimate 
based on counting misclassified labeled test patterns, $P_e(1 - P_e)/N$. 
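
As a small illustration of equations (A-1) and (A-3), the sketch below estimates the Bayes error from the class posteriors of a few unlabeled patterns and compares a plug-in variance estimate with the error-counting variance. The posterior values are made up for illustration, and the function name `bayes_error_estimate` is hypothetical. The comparison relies on the fact that $r(X) \le 1 - 1/C$ for $C$ classes, so the gap between the two variances is at least $\hat{P}_e/(CN)$.

```python
def bayes_error_estimate(posteriors):
    """Estimate the Bayes error from unlabeled patterns.

    posteriors: one list per pattern, [p(w=1|X), ..., p(w=C|X)].
    Returns (P_e estimate, plug-in variance of that estimate).
    """
    risks = [1.0 - max(p) for p in posteriors]        # r(X_i), equation (A-1)
    n = len(risks)
    p_e = sum(risks) / n                              # sample mean, equation (A-3)
    second_moment = sum(r * r for r in risks) / n     # sample estimate of E[r^2(X)]
    variance = (second_moment - p_e ** 2) / n         # plug-in form of Var = (E[r^2] - P_e^2) / N
    return p_e, variance


# Illustrative two-class posteriors for N = 4 unlabeled patterns (assumed values).
posteriors = [[0.9, 0.1], [0.6, 0.4], [0.2, 0.8], [0.5, 0.5]]
n_classes = 2

p_e, variance = bayes_error_estimate(posteriors)
counting_variance = p_e * (1.0 - p_e) / len(posteriors)
gap = counting_variance - variance
print(p_e, variance, counting_variance, gap)
```

For these values the estimate is 0.3, and the gap between the two variances exceeds $\hat{P}_e/(CN) = 0.0375$.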


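Using equations (3-5) and (A-1) in equation (A-3), an estimate of the Bayes error 
probability can be written as 

\[
\hat{P}_e = 1 - \frac{1}{N} \sum_{j=1}^{N} \max_i \left[ \sum_{\ell=1}^{m} \alpha_{\ell i}\, p(\Omega = \ell \mid X_j) \right] \tag{A-5}
\]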


Now the problem can be formulated as follows. 

Find: $\alpha_{\ell i}$, $i = 1, 2, \ldots, C$ and $\ell = 1, 2, \ldots, m$, such that $\hat{P}_e$ of equation (A-5) is 
minimized, subject to the constraints of equation (3-7). 


Optimization techniques such as Davidon-Fletcher-Powell (refs. 9, 10, 11) can 
easily be used to solve the above optimization problem. 
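
For a small number of modes, the grouping can also be explored by brute force instead of a gradient method. The sketch below is not the Davidon-Fletcher-Powell procedure named above; it simply evaluates the Bayes error estimate for every hard (0/1) assignment of modes to classes. It assumes that equation (3-5) expresses the class posterior as $\sum_\ell \alpha_{\ell i}\, p(\Omega = \ell \mid X)$, and, because equation (3-7) is not reproduced in this appendix, it substitutes the assumption that every class must receive at least one mode. All names and data values are illustrative.

```python
from itertools import product


def bayes_error_for_labeling(alpha, mode_posteriors):
    """Bayes error estimate for a given labeling.

    alpha[l][i]           : assumed P(class i | mode l)
    mode_posteriors[j][l] : p(mode l | X_j) for unlabeled pattern X_j
    """
    m, n_classes = len(alpha), len(alpha[0])
    total = 0.0
    for post in mode_posteriors:
        class_post = [sum(alpha[l][i] * post[l] for l in range(m))
                      for i in range(n_classes)]
        total += 1.0 - max(class_post)
    return total / len(mode_posteriors)


def best_hard_grouping(mode_posteriors, m, n_classes):
    """Exhaustive search over hard mode-to-class assignments."""
    best = None
    for assignment in product(range(n_classes), repeat=m):
        if len(set(assignment)) < n_classes:
            continue  # assumed constraint: every class gets at least one mode
        alpha = [[1.0 if assignment[l] == i else 0.0 for i in range(n_classes)]
                 for l in range(m)]
        err = bayes_error_for_labeling(alpha, mode_posteriors)
        if best is None or err < best[0]:
            best = (err, assignment)
    return best


# Modes 0 and 1 overlap heavily; mode 2 is well separated (assumed values).
mode_posteriors = [[0.5, 0.5, 0.0], [0.6, 0.4, 0.0],
                   [0.0, 0.1, 0.9], [0.05, 0.0, 0.95]]
best_err, best_assignment = best_hard_grouping(mode_posteriors, m=3, n_classes=2)
print(best_err, best_assignment)
```

In this toy case the two overlapping modes end up in the same class, which is the behavior the criterion is meant to produce. The class indices themselves are arbitrary, since no labeled patterns are used.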


APPENDIX B 

CLUSTER LABELING WITH A SET OF LABELED PATTERNS FROM A 
SINGLE CLASS AND A SET OF UNLABELED PATTERNS 

In practice, the situation often arises in which the patterns of one class are 
more easily and accurately labeled than the patterns of another class. For 
example, in remote sensing, it is easier to label the pixels of the class "other" 
(ref. 22), and the accuracy of labeling is higher for these pixels than for the 
pixels of the wheat class. This appendix formulates the problem of obtaining 
class labels for the clusters using information from a given set of labeled 
patterns from a single class and a set of unlabeled patterns. It is assumed 
that there are only two classes. The probability of correct labeling is used 
as a criterion. Let $X_1(i)$, $i = 1, 2, \ldots, N_1$ be the given patterns from class 
1 and $Y_i$, $i = 1, 2, \ldots, N_u$ be the given set of unlabeled patterns. From 
equations (4-1) and (4-2), we get 

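\[
\begin{aligned}
P_s &= \sum_{i=1}^{2} P(\omega = i) \int p(\omega = i \mid X)\, p(X \mid \omega = i)\, dX \\
    &= q_1 \int \left[ p(\omega = 1 \mid X) - p(\omega = 2 \mid X) \right] p(X \mid \omega = 1)\, dX
       + \int p(\omega = 2 \mid X)\, p(X)\, dX
\end{aligned}
\tag{B-1}
\]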

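The probability $P_s$ can be estimated using the given sets of labeled and 
unlabeled patterns as follows. 

\[
\hat{P}_s = \frac{q_1}{N_1} \sum_{i=1}^{N_1} \left\{ p[\omega = 1 \mid X_1(i)] - p[\omega = 2 \mid X_1(i)] \right\}
          + \frac{1}{N_u} \sum_{i=1}^{N_u} p(\omega = 2 \mid Y_i) \tag{B-2}
\]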


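Using equation (3-5) in equation (B-2) yields 

\[
\hat{P}_s = q_1 \sum_{r=1}^{m} (\alpha_{r1} - \alpha_{r2})\, e_{1r} + \sum_{r=1}^{m} \alpha_{r2}\, e_{ur} \tag{B-3}
\]

where $e_{1r}$ is given by equation (4-13) and $e_{ur}$ is given by the following. 

\[
e_{ur} = \frac{1}{N_u} \sum_{i=1}^{N_u} P(\Omega = r \mid Y_i) \tag{B-4}
\]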
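
The step from equation (B-2) to equation (B-3) only exchanges the order of summation, which can be checked numerically. The sketch below represents each pattern by its vector of cluster posteriors and assumes that equation (3-5) has the form $p(\omega = i \mid X) = \sum_r \alpha_{ri}\, P(\Omega = r \mid X)$ and that $e_{1r}$ of equation (4-13) is the average of $P(\Omega = r \mid X_1(i))$ over the class 1 patterns. Both are assumptions here, since those equations lie outside this appendix. All data values and function names are illustrative.

```python
def class_posterior(alpha, cluster_post, i):
    """Assumed form of equation (3-5): sum_r alpha[r][i] * P(cluster r | X)."""
    return sum(alpha[r][i] * cluster_post[r] for r in range(len(alpha)))


def p_s_direct(q1, alpha, labeled_1, unlabeled):
    """Equation (B-2): sample averages of class posteriors (index 0 = class 1)."""
    term_1 = sum(class_posterior(alpha, x, 0) - class_posterior(alpha, x, 1)
                 for x in labeled_1) / len(labeled_1)
    term_2 = sum(class_posterior(alpha, y, 1) for y in unlabeled) / len(unlabeled)
    return q1 * term_1 + term_2


def p_s_from_averages(q1, alpha, labeled_1, unlabeled):
    """Equation (B-3): the same quantity from averaged cluster posteriors."""
    m = len(alpha)
    e_1 = [sum(x[r] for x in labeled_1) / len(labeled_1) for r in range(m)]  # assumed (4-13)
    e_u = [sum(y[r] for y in unlabeled) / len(unlabeled) for r in range(m)]  # (B-4)
    return (q1 * sum((alpha[r][0] - alpha[r][1]) * e_1[r] for r in range(m))
            + sum(alpha[r][1] * e_u[r] for r in range(m)))


# Illustrative values: m = 3 clusters, two classes.
alpha = [[0.9, 0.1], [0.3, 0.7], [0.05, 0.95]]
labeled_1 = [[0.8, 0.15, 0.05], [0.6, 0.3, 0.1]]
unlabeled = [[0.7, 0.2, 0.1], [0.1, 0.3, 0.6], [0.05, 0.15, 0.8]]
q1 = 0.35

direct = p_s_direct(q1, alpha, labeled_1, unlabeled)
averaged = p_s_from_averages(q1, alpha, labeled_1, unlabeled)
print(direct, averaged)
```

The averaged form is convenient in an optimizer because $e_{1r}$ and $e_{ur}$ are computed once, after which each evaluation of the criterion costs only a sum over the $m$ clusters.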


Now the problem of estimating the probabilities of class labels to the 
clusters can be formulated as follows. 

Find $q_1$ and $\alpha_{ri}$, where $r = 1, 2, \ldots, m$ and $i = 1, 2$, such that $\hat{P}_s$ is maximized, 
subject to the constraints 

\[
\begin{aligned}
& \sum_{i=1}^{2} \alpha_{ri} = 1; \quad r = 1, 2, \ldots, m \\
& \sum_{r=1}^{m} \alpha_{r1}\, \delta_r = q_1 \\
& 0 \le q_1 \le 1 \\
& \alpha_{ri} \ge 0; \quad i = 1, 2 \text{ and } r = 1, 2, \ldots, m
\end{aligned}
\]

Optimization techniques such as Davidon-Fletcher-Powell (refs. 9, 10, 11) can 
easily be used to solve the above optimization problem. 




APPENDIX C 


FIXED POINT ITERATION SCHEMES FOR PROBABILISTIC CLUSTER LABELING WITH 
THE CRITERION OF PROBABILITY OF CORRECT LABELING 


Fixed point iteration schemes are presented in this appendix for obtaining the 
probabilities of class labels to the clusters, assuming that the a priori 
probabilities $q_i$ of the classes can be approximately estimated from the given 
labeled patterns. 

C.1 THE LABELS OF THE GIVEN PATTERN SET ARE PERFECT 


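Since the logarithm is a monotonic function of its argument, taking the log of 
equation (4-15) and using inequality (2-9) results in 

\[
\begin{aligned}
P_s' = \log(\hat{P}_s)
  &= \log\left( \sum_{i=1}^{C} q_i \sum_{\ell=1}^{m} \alpha_{\ell i}\, e_{i\ell} \right) \\
  &\ge \sum_{i=1}^{C} q_i \log\left( \sum_{\ell=1}^{m} \alpha_{\ell i}\, e_{i\ell} \right)
\end{aligned}
\tag{C-1}
\]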


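A fixed point iteration scheme for computing the probabilities of class 
labels to the clusters $\alpha_{\ell i}$ that maximize the lower bound of equation (C-1), 
subject to the constraints of equation (3-7), can easily be shown to be the 
following. 

\[
\alpha_{\ell i} = \frac{d_{\ell i}}{\sum_{i=1}^{C} d_{\ell i}}; \quad \ell = 1, 2, \ldots, m \text{ and } i = 1, 2, \ldots, C \tag{C-2}
\]

where 

\[
d_{\ell i} = \frac{q_i\, \alpha_{\ell i}\, e_{i\ell}}{\sum_{r=1}^{m} \alpha_{ri}\, e_{ir}} \tag{C-3}
\]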
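
A minimal sketch of such a fixed point iteration is given below. It assumes the update takes the form $\alpha_{\ell i} \leftarrow d_{\ell i} / \sum_i d_{\ell i}$ with $d_{\ell i} = q_i\, \alpha_{\ell i}\, e_{i\ell} / \sum_r \alpha_{ri}\, e_{ir}$, and that the only constraint taken from equation (3-7) is that each row of $\alpha$ sums to one. The values of `e` and `q` are made up for illustration, and the function names are hypothetical.

```python
import math


def lower_bound(alpha, e, q):
    """Lower bound of (C-1): sum_i q_i * log(sum_l alpha[l][i] * e[i][l])."""
    m = len(alpha)
    return sum(q[i] * math.log(sum(alpha[l][i] * e[i][l] for l in range(m)))
               for i in range(len(q)))


def fixed_point_step(alpha, e, q):
    """One update of the assumed scheme (C-2)/(C-3)."""
    m, n_classes = len(alpha), len(q)
    # s[i] = sum_r alpha[r][i] * e[i][r], the denominator of (C-3)
    s = [sum(alpha[r][i] * e[i][r] for r in range(m)) for i in range(n_classes)]
    new_alpha = []
    for l in range(m):
        d = [q[i] * alpha[l][i] * e[i][l] / s[i] for i in range(n_classes)]  # (C-3)
        total = sum(d)
        new_alpha.append([d_i / total for d_i in d])                         # (C-2)
    return new_alpha


# Illustrative inputs: C = 2 classes, m = 3 clusters.
# e[i][l] is the average cluster-l posterior over the labeled patterns of class i.
e = [[0.7, 0.2, 0.1],
     [0.1, 0.3, 0.6]]
q = [0.4, 0.6]

alpha = [[0.5, 0.5] for _ in range(3)]   # uniform starting point
bounds = [lower_bound(alpha, e, q)]
for _ in range(50):
    alpha = fixed_point_step(alpha, e, q)
    bounds.append(lower_bound(alpha, e, q))
print(alpha)
print(bounds[0], bounds[-1])
```

Each step keeps every row of $\alpha$ on the probability simplex, and for this form of update the lower bound does not decrease from one iteration to the next, which gives a simple convergence check.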


C.2 THE LABELS OF THE GIVEN PATTERN SET ARE IMPERFECT 


Equation (6-16) can be written in terms of the a priori probabilities of the 
imperfectly labeled classes as 

[Equation (C-4): display garbled in the source; the surviving summand is $\alpha_{ri}\, v_{ij}\, e_{jr}$.] 

Taking the log of equation (C-4) and using inequality (2-9) yields 

[Equation (C-5): display missing in the source.] 

The probabilities $\alpha_{\ell i}$ that maximize the lower bound of equation (C-5), subject 
to the constraints of equation (3-7), can easily be shown to satisfy the 
following for $\ell = 1, 2, \ldots, m$ and $i = 1, 2, \ldots, C$. 

[Equations (C-6) and (C-7): displays garbled in the source; the surviving term is the double sum $\sum \sum \alpha_{ri}\, v_{ij}\, e_{jr}$.] 


C.3 EXPERIMENTAL RESULTS 


Some simulation results are presented in this section for estimating the 
proportion of the class of interest using the schemes of sections C.1 and 
C.2. The a priori probabilities $q_i$ in equation (C-3) are estimated from the 
given labeled patterns for use in the fixed point iteration scheme of equation 
(C-2). The same labeled patterns and the cluster statistics of section 7 are 
used. The proportion of the class of interest, class 1, is estimated using 
equation (3-13). The fixed point iteration scheme of equation (C-5) is used 
to obtain the probabilities of class labels to the clusters by taking into account 
the imperfections in the labels. The proportion of the class of interest, 
class 1, is estimated using equation (3-13). The results are listed in 
table C-1. From table C-1, it is seen that better estimates are obtained 
by taking the imperfections in the labels into account. 


TABLE C-1.- ESTIMATION OF PROPORTION OF CLASS 1 WITH THE CRITERION OF PROBABILITY OF CORRECT LABELING 




APPENDIX D 


PROPORTION ESTIMATION WITH IMPURE CLUSTERS 

Unsupervised classification or clustering algorithms are frequently used in 
the estimation of the proportions of classes of interest in remotely sensed 
imagery data. Based on extensive practical studies, it is observed that the 
clusters produced by many clustering algorithms are impure; that is, they 
contain more than one class. In the previous sections, approaches were 
presented for proportion estimation through cluster labeling assuming the 
clusters are pure. 

This appendix considers the problem of estimating the proportions of classes of 
interest with impure clusters. Let $\omega$, $\Omega$, and $\Phi$ denote class, cluster, and 
mode, respectively. It is assumed that a set of labeled patterns 
$X_i(j)$ from class $\omega = i$, where $j = 1, 2, \ldots, N_i$ and $i = 1, 2, \ldots, C$, is given. It is also 
assumed that the cluster proportions and the cluster densities are given. 
Probability of correct labeling is used as a criterion. An estimate of the 
probability of correct labeling for particular class-conditional densities can 
be written as 

\[
\hat{P}_s = \sum_{i=1}^{C} q_i\, \frac{1}{N_i} \sum_{j=1}^{N_i} p[\omega = i \mid X_i(j)] \tag{D-1}
\]


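In the following, it is assumed that the number of inherent modes in the data 
is given and, for simplicity, it is assumed to be equal to the number of 
clusters. Equation (3-5) can be written with respect to the modes as 

\[
p(\omega = i \mid X) = \sum_{\ell=1}^{m} P(\omega = i \mid \Phi = \ell)\, p(\Phi = \ell \mid X) \tag{D-2}
\]

Using equation (D-2) in equation (D-1) yields 

\[
\hat{P}_s = \sum_{i=1}^{C} q_i \sum_{\ell=1}^{m} \eta_{\ell i}\, e_{i\ell} \tag{D-3}
\]

where 

\[
\eta_{\ell i} = P(\omega = i \mid \Phi = \ell)
\]

and 

\[
e_{i\ell} = \frac{1}{N_i} \sum_{r=1}^{N_i} p[\Phi = \ell \mid X_i(r)] \tag{D-4}
\]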


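Similar to equation (3-4), a relationship between the cluster and 
class-conditional densities can be written as 

\[
\begin{aligned}
p(X \mid \Omega = \ell) &= \sum_{i=1}^{C} p(X, \omega = i \mid \Omega = \ell) \\
  &= \sum_{i=1}^{C} p(X \mid \omega = i, \Omega = \ell)\, P(\omega = i \mid \Omega = \ell) \\
  &= \sum_{i=1}^{C} p(X \mid \omega = i)\, P(\omega = i \mid \Omega = \ell)
\end{aligned}
\tag{D-5}
\]

where it is assumed that 

\[
p(X \mid \omega = i) = p(X \mid \omega = i, \Omega = \ell) \tag{D-6}
\]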


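The probabilities $P(\Omega = \ell \mid \omega = i)$ can be estimated using the given labeled 
patterns as follows. 

\[
\begin{aligned}
P(\Omega = \ell \mid \omega = i) &= \int p(\Omega = \ell \mid X)\, p(X \mid \omega = i)\, dX \\
  &\approx \frac{1}{N_i} \sum_{r=1}^{N_i} p[\Omega = \ell \mid X_i(r)]
\end{aligned}
\tag{D-7}
\]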


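From the Bayes rule we obtain 

\[
\alpha_{\ell i} = P(\omega = i \mid \Omega = \ell) = \frac{q_i}{\delta_\ell}\, P(\Omega = \ell \mid \omega = i) \tag{D-8}
\]

where $\delta_\ell$ is the a priori probability of cluster $\ell$. Let 

\[
c_{i\ell} = P(\Omega = \ell \mid \omega = i) \tag{D-9}
\]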


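Using equations (D-8) and (D-9) in equation (D-5) yields 

\[
p(\Omega = \ell \mid X) = \sum_{r=1}^{m} p(\Phi = r \mid X) \sum_{i=1}^{C} c_{i\ell}\, \eta_{ri}; \quad \ell = 1, 2, \ldots, m \tag{D-10}
\]

For a given $\eta_{ri}$ and estimated $c_{i\ell}$, inverting equation (D-10) yields 
$p(\Phi = r \mid X)$. The mode proportions $\varphi_\ell = P(\Phi = \ell)$ can be estimated from 
$p(\Phi = r \mid X)$ as follows. 

\[
\varphi_\ell = P(\Phi = \ell) = \int P(\Phi = \ell \mid X)\, p(X)\, dX \tag{D-11}
\]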
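
The inversion step can be sketched as a small linear solve per pattern. The code below assumes that equation (D-10) has the form $p(\Omega = \ell \mid X) = \sum_r M_{\ell r}\, p(\Phi = r \mid X)$ with $M_{\ell r} = \sum_i c_{i\ell}\, \eta_{ri}$, that the number of modes equals the number of clusters so that $M$ is square, and that $M$ is nonsingular. The mode proportions are then taken as the sample mean of the recovered mode posteriors, a sample form of equation (D-11). The matrices `eta` and `c` and all function names are illustrative.

```python
def solve_linear(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    aug = [list(a[r]) + [b[r]] for r in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [aug[r][n] / aug[r][r] for r in range(n)]


def mixing_matrix(eta, c):
    """M[l][r] = sum_i c[i][l] * eta[r][i], the assumed coefficients of (D-10)."""
    m, n_classes = len(eta), len(c)
    return [[sum(c[i][l] * eta[r][i] for i in range(n_classes)) for r in range(m)]
            for l in range(m)]


def mode_posteriors(cluster_post, eta, c):
    """Invert (D-10) for one pattern: cluster posteriors -> mode posteriors."""
    return solve_linear(mixing_matrix(eta, c), cluster_post)


def mode_proportions(cluster_posts, eta, c):
    """Sample form of (D-11): average the recovered mode posteriors."""
    posts = [mode_posteriors(p, eta, c) for p in cluster_posts]
    return [sum(p[l] for p in posts) / len(posts) for l in range(len(eta))]


# Illustrative values: m = 2 modes/clusters, C = 2 classes.
eta = [[0.9, 0.1], [0.2, 0.8]]      # eta[r][i] = P(class i | mode r)
c = [[0.7, 0.3], [0.25, 0.75]]      # c[i][l]   = P(cluster l | class i)

true_mode_post = [0.3, 0.7]
m_matrix = mixing_matrix(eta, c)
cluster_post = [sum(m_matrix[l][r] * true_mode_post[r] for r in range(2))
                for l in range(2)]
recovered = mode_posteriors(cluster_post, eta, c)
proportions = mode_proportions([cluster_post, cluster_post], eta, c)
print(recovered, proportions)
```

If $M$ is close to singular, the recovered posteriors become very sensitive to errors in the estimated $c_{i\ell}$, so in practice the conditioning of $M$ would need to be checked before inverting.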

Using the given patterns, $\varphi_\ell$ can be estimated from equation (D-11). Now the 
problem of estimating the proportions may be formulated as follows. 

Find: $q_i$ and $\eta_{\ell i}$; $\ell = 1, 2, \ldots, m$ and $i = 1, 2, \ldots, C$ such that $\hat{P}_s$ of 
equation (D-3) is maximized subject to the constraints 

\[
\begin{aligned}
& q_i \ge 0; \quad i = 1, 2, \ldots, C \\
& \eta_{\ell i} \ge 0; \quad \ell = 1, 2, \ldots, m \text{ and } i = 1, 2, \ldots, C \\
& \sum_{i=1}^{C} q_i = 1 \\
& \sum_{i=1}^{C} \eta_{\ell i} = 1; \quad \ell = 1, 2, \ldots, m \\
& \sum_{\ell=1}^{m} \eta_{\ell i}\, \varphi_\ell = q_i; \quad i = 1, 2, \ldots, C
\end{aligned}
\tag{D-12}
\]

Optimization techniques such as Davidon-Fletcher-Powell (refs. 9, 10, 11) can 
easily be used to solve the above optimization problem. 